Design Principles for Cooperation between Modalities in Bi-directional Multimodal Interfaces

Author

  • Stéphanie Buisine
Abstract

In this article, after briefly reviewing some previous work on the design of multimodal HCI, we present two projects and the first results obtained from experimental studies, and we draw conclusions on design principles for multimodal interfaces.

ON THE DESIGN OF MULTIMODAL HCI

Classical design principles for HCI (e.g. Mayhew, 1999) recommend conducting iterative evaluations at different stages of the design process. However, evaluations and available guidelines mainly focus on output characteristics, because input devices are often fixed a priori as mouse and keyboard. In the design of multimodal interfaces, evaluations must deal with both input and output devices and test the reciprocal influences they have on each other. Concerning input from the user, the potential usefulness of multimodality has been shown in several studies (see Martin et al., 1998 for a review). Behavioral analysis methods have also been used to categorize types of cooperation between modalities (Martin et al., 1998 & 2001). Some aspects of multimodal output have been studied in the context of HCI with embodied conversational agents (ECAs). For example, the relevance of the presence of agents was tested in several applications (Craig et al., 2002; Moreno et al., 2001), and the influence of agents' properties on different subjective variables was assessed (Granström et al., 2002; Koda & Maes, 1996; McBreen et al., 2001; McBreen & Jack, 2000 & 2001; Wonish & Cooper, 2002). Finally, general principles underlying the development of multimodal HCI have also been described (Benoit et al., 2000; Oviatt, 2002). In the next section, we briefly present two projects for which we carried out experimental studies. The first results we obtained constitute a basis for formulating a few additional principles.

ILLUSTRATIVE PROJECTS

The IST-NICE (Natural Interactive Communication for Edutainment) project (http://www.niceproject.com) aims to design a conversational game for children and adolescents based on multimodal input (speech and gesture) and multimodal output (embodied conversational agents). Concerning gesture from the user, 2D pen input was initially chosen because it seemed likely to meet conversational goals and to be less constraining than 3D gestural input devices. A Wizard-of-Oz experiment was carried out within an experimental methodology framework to collect multimodal behavioral data from users and test the effectiveness of the interaction. Adults and children were videotaped while interacting with 2D animated agents in a game application (Buisine et al., 2002). Each subject performed a multimodal scenario (allowing use of speech and/or pen gesture on the screen) and a speech-only scenario. The results confirm the usefulness of multimodal input whatever the subjects' age and gender: multimodal scenarios proved to be shorter and were rated as easier than speech-only scenarios. Moreover, multimodality homogenized ratings of easiness across all participants better than the speech-only condition (Buisine & Martin, submitted). Additional results showed that gesture interaction was more important for children than for adults, both in quantity and in variety. A factorial analysis also showed that children's use of the pen was associated with high ratings of pleasantness. Finally, analyses combining speech and pen gestures showed that certain commands were mainly performed by pen gestures and others by speech. The collected data are currently being exploited to build the multimodal language model for the design of the real system.
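As an illustration of how such a behavioral corpus can reveal modality specialization, here is a minimal analysis sketch in Python. It is not from the paper: the command names, annotations, and counts are hypothetical, and a real corpus would be annotated with a dedicated video-coding scheme.

    # Minimal analysis sketch (hypothetical, not from the paper): given a
    # Wizard-of-Oz corpus annotated as (command, modality) events, measure
    # how strongly each command type is specialized toward one modality.
    from collections import Counter, defaultdict

    # Hypothetical annotated events; a real corpus would come from video coding.
    corpus = [
        ("select_object", "pen"), ("select_object", "pen"),
        ("select_object", "speech"), ("ask_question", "speech"),
        ("ask_question", "speech"), ("navigate", "pen"),
    ]

    by_command = defaultdict(Counter)
    for command, modality in corpus:
        by_command[command][modality] += 1

    for command, counts in sorted(by_command.items()):
        total = sum(counts.values())
        dominant, n = counts.most_common(1)[0]
        print(f"{command}: {dominant} in {n / total:.0%} of {total} occurrences")

Commands whose events concentrate on one modality are candidates for specialized recognition, which connects to the adaptation principle proposed below.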
The RNRT-iTV (Interactive Television) project (http://cpn.paris.ensam.fr/tvi/) is intended to develop an interactive television interface including multimodal input. In this respect, both speech and pen input seem likely to provide direct designation of items. To test this hypothesis and the usefulness of multimodal input, a Wizard-of-Oz experiment was held in which participants had to perform TV-program search scenarios under three modality conditions: interaction by speech input, by pen input, and multimodal interaction (speech and pen input). The output device consisted of a web site including text and graphics, to which verbal error messages had been added. Preliminary analyses of the behavioral corpus showed that the syntax of verbal commands was very simple: subjects used the labels displayed on the interface and did not build complex sentences. In the multimodal condition, most subjects used only one of the two modalities and appreciated choosing it in accordance with their preferences.

SUGGESTED DESIGN PRINCIPLES

From the results obtained in these two experimental studies, we propose the following principles:

• Enable the use of the same modalities in input and output: The symmetry principle for speech (speech must be bi-directional) could be transferred to multimodality. For example, we observed that interaction with conversational agents could benefit from gestural input, particularly for children (Buisine & Martin, submitted). When users interact with an ECA both in input and output by speech, gesture, and sometimes facial expressions or body posture, the symmetry concerns the modality of interaction. But it can also be extended to the characteristics of communication: if the interface makes use of 3D gestures via an ECA, users should also be able to use 3D gestures. Moreover, observed cooperations in the users' modalities (e.g. redundancy between speech and gesture) should elicit similar modality combinations in the ECA.

• Use multimodal cues both in input and output to improve speech turns: The use of multimodal cues is likely to improve speech turns. This principle has proved relevant in output when the user interacts with an agent (e.g. Cassell & Vilhjalmsson, 1999; Gustafson, 2002), but it could also be adapted to input. Recognition of facial expressions, gaze direction and/or non-verbal speech could indicate when the user wants to take or give the turn in the conversation (Thórisson, 1999). For example, as we observed with the pen for children, gestural exploration could mean that the user is quite lost and that the system should take the initiative in the dialog.

• Use appropriate outputs to induce multimodal input behavior that is easier to process: Appropriate outputs may induce multimodal input behavior that is easier to process. For example, labels displayed in the interface can be spontaneously reused in speech input, which limits the vocabulary and thus facilitates speech recognition (a minimal sketch follows this list).

• Use modalities appropriate for the user: Multimodality requires paying even more attention to users' profiles than is normally the case in the design process. For example, age must be taken into account regarding speech and motor preferences and capabilities.

• Adapt the recognition system according to the observed cooperations between modalities: Task analysis and preliminary tests could shed light on "intuitive" cooperation between modalities in a given application. For example, in one of our studies, modalities appeared to cooperate by specialization (some commands were mainly performed by one modality). In this case, adapting the multimodal recognition system accordingly could enhance its effectiveness and robustness. Conversely, for commands where no specialization arises, the recognition system should allow freedom of choice between modalities and adapt to users' preferences.
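The sketch below illustrates the principle that displayed labels can constrain the speech vocabulary. It is a hypothetical illustration, not the iTV system: the labels, command verbs, and matching rule are invented, and a real system would pass such a vocabulary to a speech recognizer as a constrained grammar or language model.

    # Hypothetical sketch: derive the active speech vocabulary from the labels
    # currently displayed on screen, so that spontaneous reuse of those labels
    # falls inside a small, easily recognized language model.
    def build_vocabulary(displayed_labels):
        """Fixed command verbs plus words from the on-screen labels."""
        vocab = {"show", "select", "open"}  # invented command verbs
        for label in displayed_labels:
            vocab.update(label.lower().split())
        return vocab

    def in_vocabulary(utterance, vocab):
        """Accept an utterance only if all of its words are in the vocabulary."""
        return all(word in vocab for word in utterance.lower().split())

    # Labels from a hypothetical TV-program search screen.
    vocab = build_vocabulary(["Evening films", "News", "Sports"])
    print(in_vocabulary("show evening films", vocab))       # True
    print(in_vocabulary("find me something funny", vocab))  # False

Rebuilding the vocabulary whenever the display changes keeps the recognition search space small, which is one way the interface's output can induce input behavior that is easier to process.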
CONCLUSION AND SUGGESTIONS FOR THE WORKSHOP

An experimental approach in the context of design projects is likely to provide both applied and general results. The latter can be exploited in general HCI design specifications, such as our recommendation to integrate input and output in the design of multimodal HCI. In the case of intuitive multimodal interfaces with ECAs, multimodality in input should be considered part of the ECA and not dissociated from it during either the development or the evaluation phases. Bi-directionality and simultaneity of communication must be better integrated, for example with respect to speech turns. However, such design principles need to be integrated into a larger framework and confronted with other results. Furthermore, principles arising from different protocols and contexts should be classified according, for example, to dimensions of multimodality or to designers' goals (in terms of system performance, user satisfaction, etc.).

ACKNOWLEDGMENTS

The work described in this paper was financed by the IST-NICE project (http://www.niceproject.com) and the RNRT Interactive Television project (http://cpn.paris.ensam.fr/tvi/). The authors wish to thank their partners in these projects.

REFERENCES

Benoit, C., Martin, J.C., Pelachaud, C., Schomaker, L. & Suhm, B. (2000). Audio-Visual and Multimodal Speech Systems. In D. Gibbon (Ed.), Handbook of Standards and Resources for Spoken Language Systems Supplement.

SIMILAR ARTICLES

Evaluation of a Multimodal System Based on Dialogue Models and Transformations

To provide user interfaces for a rich set of devices and interaction modalities, we follow a model-based development methodology and devise an architecture which deploys user interfaces specified as dialogue models with abstract interaction objects and allows context-based adaptations by means of an external transcoding process. For the validation of the applicability of this methodology for d...


On the Annotation of Multimodal Behavior and Computation of Cooperation Between Modalities

With the success of multimedia and mobile devices, human-computer interfaces combining several communication modalities such as speech and gesture may lead to more "natural" human-computer interaction. Yet, developing multimodal interfaces requires an understanding (and thus the observation and analysis) of human multimodal behavior. In the field of annotation of multimodal corpora, there is no st...


Distributed and Cooperative Compressive Sensing Recovery Algorithm for Wireless Sensor Networks with Bi-directional Incremental Topology

Recently, the problem of compressive sensing (CS) has attracted much attention in the area of signal processing, and a great deal of research in the field is being carried out on this issue. One of the applications where CS could be used is wireless sensor networks (WSNs). The structure of WSNs consists of many low-power wireless sensors. This requires that any improved algorithm for this appli...


TYCOON: Theoretical Framework and Software Tools for Multimodal Interfaces

We define a modality as a process analyzing and producing chunks of information. For instance, a speech recognition modality analyses speech signals and produces the labels of recognized words. Several multimodal interfaces combining such modalities have already been developed (IMMI'95, CMC'95). To benefit from them so as to advance research and implementation of multimodal interfaces, ...


Two optimal algorithms for finding bi-directional shortest path design problem in a block layout

In this paper, the Shortest Path Design Problem (SPDP), in which the path is incident to all cells, is considered. The bi-directional path is one of the known types of network configuration for Automated Guided Vehicles (AGVs). To solve this problem, two algorithms are developed. For each algorithm an Integer Linear Programming (ILP) model is determined. The objective functions of both algorithms are t...




Publication date: 2003